{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LlamaIndex Bottoms-Up Development - Evaluation Baseline\n",
    "\n",
    "LlamaIndex provides some basic evaluation of query engines! We can setup an evaluator that will measure both hallucinations, as well as if the query was actually answered!\n",
    "\n",
    "This is provided by two main evaluations:\n",
    "\n",
    "- `ResponseSourceEvaluator` - uses an LLM to decide if the response is similar enough to the sources -- a good measure for hallunication detection!\n",
    "- `QueryResponseEvaluator` - uses an LLM to decide if a response is similar enough to the original query -- a good measure for checking if the query was answered!\n",
    "\n",
    "You may have noticed that we are using an LLM for this task. That means we will want to pick a powerful LLM, like GPT-4 or Claude-2.\n",
    "\n",
    "Lastly, using these methods, we can also use the LLM to generate syntheic questions to evaluate with!"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup the Baseline Query Engine"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading our Docs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"API_KEY_HERE\"\n",
    "openai.api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "sys.path.append(os.path.join(os.getcwd(), '..'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_docs_bot.markdown_docs_reader import MarkdownDocsReader\n",
    "from llama_index import SimpleDirectoryReader\n",
    "\n",
    "def load_markdown_docs(filepath):\n",
    "    \"\"\"Load markdown docs from a directory, excluding all other file types.\"\"\"\n",
    "    loader = SimpleDirectoryReader(\n",
    "        input_dir=filepath, \n",
    "        required_exts=[\".md\"],\n",
    "        file_extractor={\".md\": MarkdownDocsReader()},\n",
    "        recursive=True\n",
    "    )\n",
    "\n",
    "    documents = loader.load_data()\n",
    "\n",
    "    # exclude some metadata from the LLM\n",
    "    for doc in documents:\n",
    "        doc.excluded_llm_metadata_keys = [\"File Name\", \"Content Type\", \"Header Path\"]\n",
    "\n",
    "    return documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load our documents from each folder.\n",
    "# we keep them seperate for now, in order to create seperate indexes later\n",
    "getting_started_docs = load_markdown_docs(\"../docs/getting_started\")\n",
    "community_docs = load_markdown_docs(\"../docs/community\")\n",
    "data_docs = load_markdown_docs(\"../docs/core_modules/data_modules\")\n",
    "agent_docs = load_markdown_docs(\"../docs/core_modules/agent_modules\")\n",
    "model_docs = load_markdown_docs(\"../docs/core_modules/model_modules\")\n",
    "query_docs = load_markdown_docs(\"../docs/core_modules/query_modules\")\n",
    "supporting_docs = load_markdown_docs(\"../docs/core_modules/supporting_modules\")\n",
    "tutorials_docs = load_markdown_docs(\"../docs/end_to_end_tutorials\")\n",
    "contributing_docs = load_markdown_docs(\"../docs/development\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the indicies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import ServiceContext, set_global_service_context\n",
    "from llama_index.llms import OpenAI\n",
    "\n",
    "# create a global service context\n",
    "service_context = ServiceContext.from_defaults(llm=OpenAI(model=\"gpt-3.5-turbo\", temperature=0))\n",
    "set_global_service_context(service_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import VectorStoreIndex, StorageContext, load_index_from_storage\n",
    "\n",
    "# create a vector store index for each folder\n",
    "try:\n",
    "    getting_started_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./getting_started_index\"))\n",
    "    community_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./community_index\"))\n",
    "    data_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./data_index\"))\n",
    "    agent_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./agent_index\"))\n",
    "    model_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./model_index\"))\n",
    "    query_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./query_index\"))\n",
    "    supporting_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./supporting_index\"))\n",
    "    tutorials_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./tutorials_index\"))\n",
    "    contributing_index = load_index_from_storage(StorageContext.from_defaults(persist_dir=\"./contributing_index\"))\n",
    "except:\n",
    "    getting_started_index = VectorStoreIndex.from_documents(getting_started_docs)\n",
    "    getting_started_index.storage_context.persist(persist_dir=\"./getting_started_index\")\n",
    "\n",
    "    community_index = VectorStoreIndex.from_documents(community_docs)\n",
    "    community_index.storage_context.persist(persist_dir=\"./community_index\")\n",
    "\n",
    "    data_index = VectorStoreIndex.from_documents(data_docs)\n",
    "    data_index.storage_context.persist(persist_dir=\"./data_index\")\n",
    "\n",
    "    agent_index = VectorStoreIndex.from_documents(agent_docs)\n",
    "    agent_index.storage_context.persist(persist_dir=\"./agent_index\")\n",
    "\n",
    "    model_index = VectorStoreIndex.from_documents(model_docs)\n",
    "    model_index.storage_context.persist(persist_dir=\"./model_index\")\n",
    "\n",
    "    query_index = VectorStoreIndex.from_documents(query_docs)\n",
    "    query_index.storage_context.persist(persist_dir=\"./query_index\")    \n",
    "\n",
    "    supporting_index = VectorStoreIndex.from_documents(supporting_docs)\n",
    "    supporting_index.storage_context.persist(persist_dir=\"./supporting_index\")\n",
    "\n",
    "    tutorials_index = VectorStoreIndex.from_documents(tutorials_docs)\n",
    "    tutorials_index.storage_context.persist(persist_dir=\"./tutorials_index\")\n",
    "\n",
    "    contributing_index = VectorStoreIndex.from_documents(contributing_docs)\n",
    "    contributing_index.storage_context.persist(persist_dir=\"./contributing_index\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Query Engine Tools\n",
    "\n",
    "Since we have so many indicies, we can create a query engine tool for each and then use them in a single query engine!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.tools import QueryEngineTool\n",
    "\n",
    "# create a query engine tool for each folder\n",
    "getting_started_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=getting_started_index.as_query_engine(), \n",
    "    name=\"Getting Started\", \n",
    "    description=\"Useful for answering questions about installing and running llama index, as well as basic explanations of how llama index works.\"\n",
    ")\n",
    "\n",
    "community_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=community_index.as_query_engine(),\n",
    "    name=\"Community\",\n",
    "    description=\"Useful for answering questions about integrations and other apps built by the community.\"\n",
    ")\n",
    "\n",
    "data_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=data_index.as_query_engine(),\n",
    "    name=\"Data Modules\",\n",
    "    description=\"Useful for answering questions about data loaders, documents, nodes, and index structures.\"\n",
    ")\n",
    "\n",
    "agent_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=agent_index.as_query_engine(),\n",
    "    name=\"Agent Modules\",\n",
    "    description=\"Useful for answering questions about data agents, agent configurations, and tools.\"\n",
    ")\n",
    "\n",
    "model_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=model_index.as_query_engine(),\n",
    "    name=\"Model Modules\",\n",
    "    description=\"Useful for answering questions about using and configuring LLMs, embedding modles, and prompts.\"\n",
    ")\n",
    "\n",
    "query_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=query_index.as_query_engine(),\n",
    "    name=\"Query Modules\",\n",
    "    description=\"Useful for answering questions about query engines, query configurations, and using various parts of the query engine pipeline.\"\n",
    ")\n",
    "\n",
    "supporting_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=supporting_index.as_query_engine(),\n",
    "    name=\"Supporting Modules\",\n",
    "    description=\"Useful for answering questions about supporting modules, such as callbacks, service context, and avaluation.\"\n",
    ")\n",
    "\n",
    "tutorials_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=tutorials_index.as_query_engine(),\n",
    "    name=\"Tutorials\",\n",
    "    description=\"Useful for answering questions about end-to-end tutorials and giving examples of specific use-cases.\"\n",
    ")\n",
    "\n",
    "contributing_tool = QueryEngineTool.from_defaults(\n",
    "    query_engine=contributing_index.as_query_engine(),\n",
    "    name=\"Contributing\",\n",
    "    description=\"Useful for answering questions about contributing to llama index, including how to contribute to the codebase and how to build documentation.\"\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Unified Query Engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "# needed for notebooks\n",
    "import nest_asyncio\n",
    "nest_asyncio.apply()\n",
    "\n",
    "from llama_index.query_engine import SubQuestionQueryEngine\n",
    "from llama_index.response_synthesizers import get_response_synthesizer\n",
    "\n",
    "query_engine = SubQuestionQueryEngine.from_defaults(\n",
    "    query_engine_tools=[\n",
    "        getting_started_tool,\n",
    "        community_tool,\n",
    "        data_tool,\n",
    "        agent_tool,\n",
    "        model_tool,\n",
    "        query_tool,\n",
    "        supporting_tool,\n",
    "        tutorials_tool,\n",
    "        contributing_tool\n",
    "    ],\n",
    "    # enable this for streaming\n",
    "    # response_synthesizer=get_response_synthesizer(streaming=True),\n",
    "    verbose=False\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test the Query Engine!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "To install LlamaIndex, you can follow these steps:\n",
      "\n",
      "1. Install the package using pip by running the command: `pip install llama-index`.\n",
      "2. Clone the repository using Git by running the command: `git clone https://github.com/jerryjliu/llama_index.git`.\n",
      "3. If you want to do an editable install of just the package itself, run the command: `pip install -e .`.\n",
      "4. If you want to install optional dependencies and dependencies used for development, run the command: `pip install -r requirements.txt`.\n",
      "\n",
      "Please note that these steps assume you have pip and Git installed on your system.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"How do I install llama index?\")\n",
    "print(str(response))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate the Basline!\n",
    "\n",
    "Now that we have our baseline query engine created, we can create a basic evaluation pipeline!\n",
    "\n",
    "Our pipeline will:\n",
    "\n",
    "- Generate a small dataset of questions\n",
    "- Save/cache these questions (so we can properly compare performance later!)\n",
    "- Evaluate both response quality and hallucination\n",
    "\n",
    "To do this reliably, we need to use an LLM smarter than `gpt-3.5-turbo`, so we will setup `gpt-4` for the evaluation process!"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate the Dataset\n",
    "\n",
    "In order to make the question generation more effecient, we can remove small documents and combine all documents into a giant single docoument.\n",
    "\n",
    "I also modify the question generation prompt, to generate a single question for each chunk, along with extra context for what it is reading."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import Document\n",
    "\n",
    "documents = SimpleDirectoryReader(\"../docs\", recursive=True, required_exts=[\".md\"]).load_data()\n",
    "\n",
    "all_text = \"\"\n",
    "\n",
    "for doc in documents:\n",
    "    all_text += doc.text\n",
    "\n",
    "giant_document = Document(text=all_text)"
   ]
  },
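  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above concatenates every document as-is; it does not drop small documents. A minimal sketch of how that filter could look is below. It works on plain strings so it runs on its own, and both `combine_texts` and the `MIN_CHARS` cutoff are hypothetical names and values, not part of LlamaIndex or of this notebook. Unlike the loop above, it also puts a blank line between documents.\n",
    "\n",
    "```python\n",
    "# MIN_CHARS is a hypothetical cutoff, not a value taken from this notebook\n",
    "MIN_CHARS = 200\n",
    "\n",
    "\n",
    "def combine_texts(texts, min_chars=MIN_CHARS):\n",
    "    \"\"\"Drop texts shorter than min_chars, then join the rest into one string.\"\"\"\n",
    "    kept = [t for t in texts if len(t.strip()) >= min_chars]\n",
    "    # a blank line between documents so their text does not run together\n",
    "    return \"\\n\\n\".join(kept)\n",
    "```\n",
    "\n",
    "With the `documents` list loaded above, this could be used as `Document(text=combine_texts([doc.text for doc in documents]))`. Changing the filter changes the generated questions, so the cached `question_dataset.txt` would need to be regenerated for a fair comparison."
   ]
  },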
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import random\n",
    "random.seed(42)\n",
    "\n",
    "from llama_index import ServiceContext\n",
    "from llama_index.prompts import Prompt\n",
    "from llama_index.llms import OpenAI\n",
    "from llama_index.evaluation import DatasetGenerator\n",
    "\n",
    "gpt4_service_context = ServiceContext.from_defaults(llm=OpenAI(llm=\"gpt-4\", temperature=0))\n",
    "\n",
    "question_dataset = []\n",
    "if os.path.exists(\"question_dataset.txt\"):\n",
    "    with open(\"question_dataset.txt\", \"r\") as f:\n",
    "        for line in f:\n",
    "            question_dataset.append(line.strip())\n",
    "else:\n",
    "    # generate questions\n",
    "    data_generator = DatasetGenerator.from_documents(\n",
    "        [giant_document],\n",
    "        text_question_template=Prompt(\n",
    "            \"A sample from the LlamaIndex documentation is below.\\n\"\n",
    "            \"---------------------\\n\"\n",
    "            \"{context_str}\\n\"\n",
    "            \"---------------------\\n\"\n",
    "            \"Using the documentation sample, carefully follow the instructions below:\\n\"\n",
    "            \"{query_str}\"\n",
    "        ),\n",
    "        question_gen_query=(\n",
    "            \"You are an evaluator for a search pipeline. Your task is to write a single question \"\n",
    "            \"using the provided documentation sample above to test the search pipeline. The question should \"\n",
    "            \"reference specific names, functions, and terms. Restrict the question to the \"\n",
    "            \"context information provided.\\n\"\n",
    "            \"Question: \"\n",
    "        ),\n",
    "        # set this to be low, so we can generate more questions\n",
    "        service_context=gpt4_service_context\n",
    "    )\n",
    "    generated_questions = data_generator.generate_questions_from_nodes()\n",
    "\n",
    "    # randomly pick 40 questions from each dataset\n",
    "    generated_questions = random.sample(generated_questions, 40)\n",
    "    question_dataset.extend(generated_questions)\n",
    "\n",
    "    print(f\"Generated {len(question_dataset)} questions.\")\n",
    "\n",
    "    # save the questions!\n",
    "    with open(\"question_dataset.txt\", \"w\") as f:\n",
    "        for question in question_dataset:\n",
    "            f.write(f\"{question.strip()}\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['What is the function used to specify the metadata visible to the embedding model and how can it be customized?', 'How can I convert tools to LangChain tools using the provided documentation sample?', 'What is the purpose of the \"router query engine\" in the LlamaIndex framework?', 'What are the different vector stores supported by LlamaIndex for use as the storage backend for `VectorStoreIndex`?', 'What is the default number of LLM calls required for the ListIndex?']\n"
     ]
    }
   ],
   "source": [
    "print(random.sample(question_dataset, 5))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate with the Dataset\n",
    "\n",
    "Now that we have our dataset, let's measure performance!"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Evaluating Response for Hallucination"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import asyncio\n",
    "import nest_asyncio\n",
    "nest_asyncio.apply()\n",
    "\n",
    "from llama_index import Response\n",
    "\n",
    "def evaluate_query_engine(evaluator, query_engine, questions):\n",
    "    async def run_query(query_engine, q):\n",
    "        try:\n",
    "            return await query_engine.aquery(q)\n",
    "        except:\n",
    "            return Response(response=\"Error, query failed.\")\n",
    "\n",
    "    total_correct = 0\n",
    "    all_results = []\n",
    "    for batch_size in range(0, len(questions), 5):\n",
    "        batch_qs = questions[batch_size:batch_size+5]\n",
    "\n",
    "        tasks = [run_query(query_engine, q) for q in batch_qs]\n",
    "        responses = asyncio.run(asyncio.gather(*tasks))\n",
    "        print(f\"finished batch {(batch_size // 5) + 1} out of {len(questions) // 5}\")\n",
    "\n",
    "        for response in responses:\n",
    "            eval_result = 1 if \"YES\" in evaluator.evaluate(response) else 0\n",
    "            total_correct += eval_result\n",
    "            all_results.append(eval_result)\n",
    "        \n",
    "        # helps avoid rate limits\n",
    "        time.sleep(1)\n",
    "\n",
    "    return total_correct, all_results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "finished batch 1 out of 8\n",
      "finished batch 2 out of 8\n",
      "finished batch 3 out of 8\n",
      "finished batch 4 out of 8\n",
      "finished batch 5 out of 8\n",
      "finished batch 6 out of 8\n",
      "finished batch 7 out of 8\n",
      "finished batch 8 out of 8\n",
      "Hallucination? Scored 29 out of 40 questions correctly.\n"
     ]
    }
   ],
   "source": [
    "from llama_index.evaluation import ResponseEvaluator\n",
    "\n",
    "# gpt-4 evaluator!\n",
    "evaluator = ResponseEvaluator(service_context=gpt4_service_context)\n",
    "\n",
    "total_correct, all_results = evaluate_query_engine(evaluator, query_engine, question_dataset)\n",
    "\n",
    "print(f\"Hallucination? Scored {total_correct} out of {len(question_dataset)} questions correctly.\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Investigating Hallucinations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['What is the purpose of the `GuidancePydanticProgram` class in the LlamaIndex documentation?'\n",
      " 'What is the purpose of the SubQuestionQueryEngine class in LlamaIndex?'\n",
      " 'What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?'\n",
      " 'What are the different vector stores supported by LlamaIndex for use as the storage backend for `VectorStoreIndex`?'\n",
      " 'What is the purpose of the `RefinePrompt` class in the LlamaIndex documentation?'\n",
      " 'What is the purpose of the Algovera tool built on top of LlamaIndex?'\n",
      " 'What are the three primary sections within the layout of the ChatView component?'\n",
      " 'What is the purpose of the `insert_terms` function in the LlamaIndex app?'\n",
      " 'What is the purpose of the `load_collection_model` function in the LlamaIndex documentation?'\n",
      " 'What is the purpose of the SQLTableNodeMapping object in the LlamaIndex documentation sample?'\n",
      " 'What is the purpose of the `RouterQueryEngine` in LlamaIndex and how can it be used in the search pipeline?']\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "hallucinated_questions = np.array(question_dataset)[np.array(all_results) == 0]\n",
    "print(hallucinated_questions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the given context information, there is no specific mention of the \"GuidancePydanticProgram\" class in the LlamaIndex documentation. Therefore, it is not possible to determine the purpose of this class without further information.\n",
      "-----------------\n",
      "> Source (Doc id: bbf2dedc-90fb-49e8-be57-28a2783ea7c1): Sub question: What is the purpose of the GuidancePydanticProgram class?\n",
      "Response: Based on the given context information, there is no specific mention of the \"GuidancePydanticProgram\" class. Therefore, it is not possible to determine the purpose of this class without further information....\n",
      "\n",
      "> Source (Doc id: b393e1a2-75d2-4017-bd12-d3b536aa3c4a): Sub question: How do I install and run LlamaIndex?\n",
      "Response: To install LlamaIndex, you can use the pip package manager by running the command \"pip install llama-index\" in your terminal or command prompt.\n",
      "\n",
      "After installing LlamaIndex, you can use the following starter example to run it:\n",
      "\n",
      "1. Import the LlamaIndex module in your Python script:\n",
      "   ```python\n",
      "   import llama_index\n",
      "   ```\n",
      "\n",
      "2. Use the LlamaIndex functions and classes as needed in your script.\n",
      "\n",
      "Make sure you have Python and pip installed on your system before running the installation command....\n",
      "\n",
      "> Source (Doc id: a9c75753-964e-4fb3-9798-b6b8ba6c044b): Sub question: What are the basic explanations of how LlamaIndex works?\n",
      "Response: LlamaIndex is a platform that allows you to create LLM-powered applications using custom data. The platform follows the retrieval augmented generation (RAG) paradigm, which combines LLM (Language Model) with custom data. \n",
      "\n",
      "The basic explanation of how LlamaIndex works is by using LLM to generate responses based on the given input. It does this by retrieving relevant information from the custom data and then using LLM to generate a response that is relevant to the input.\n",
      "\n",
      "To compose your own RAG pipeline in LlamaIndex, you need to understand key concepts and modules. These include understanding how to retrieve information from the custom data, how to use LLM for generation, and how to combine these components effectively to create your desired application.\n",
      "\n",
      "Overall, LlamaIndex provides a framework for building LLM-powered applications by combining LLM with custom data, allowing you to create applications...\n",
      "\n",
      "> Source (Doc id: 60e1c688-4acc-428b-9282-240d0c7092df): Sub question: Are there any integrations or other apps built by the community for LlamaIndex?\n",
      "Response: Yes, there are integrations and other apps built by the community for LlamaIndex....\n",
      "\n",
      "> Source (Doc id: 8880918b-13f3-4695-b805-bb2d99df393b): Sub question: What are data loaders, documents, nodes, and index structures in LlamaIndex?\n",
      "Response: In LlamaIndex, data loaders are components that are responsible for ingesting and loading external data into the system. They handle the process of converting the external data into LlamaIndex's internal representation, which consists of documents and nodes.\n",
      "\n",
      "Documents in LlamaIndex refer to the ingested data, which can be text or any other form of content. These documents are internally parsed and chunked into smaller units called nodes. Nodes represent chunks of text from the documents and serve as the basic building blocks for indexing and querying.\n",
      "\n",
      "Index structures in LlamaIndex are the data structures used to organize and store the index metadata. The index metadata includes information about the nodes, such as their location and properties, which enables efficient retrieval and querying of the data. LlamaIndex supports customizable index stores, allowing users to choose the st...\n",
      "\n",
      "> Source (Doc id: adb3de98-42f7-4ea9-8c53-67c14aa30000): Sub question: How do I work with data agents, agent configurations, and tools in LlamaIndex?\n",
      "Response: To work with data agents, agent configurations, and tools in LlamaIndex, you need to understand the core components and functionalities involved.\n",
      "\n",
      "Data agents in LlamaIndex are knowledge workers powered by LLM (Llama Language Model). They can perform various tasks on your data, both in a \"read\" and \"write\" function. They are capable of automated search and retrieval of different types of data, including unstructured, semi-structured, and structured data. Additionally, they can call external service APIs in a structured manner and process the response, storing it for later use.\n",
      "\n",
      "To build a data agent, you need two core components: a reasoning loop and tool abstractions. The reasoning loop helps the agent decide which tools to use, in what sequence, and with what parameters, based on the input task. The tool abstractions represent the APIs or tools that the agent can interact with. T...\n",
      "\n",
      "> Source (Doc id: e274a384-c44f-429a-9f4b-044b7f80ed2f): Sub question: What is the purpose of LLMs, embedding models, and prompts in LlamaIndex?\n",
      "Response: The purpose of LLMs (Large Language Models) in LlamaIndex is to provide expressive power and enhance the functionality of the framework. LLMs can be used as standalone modules or integrated into other core LlamaIndex components. They are primarily used during the response synthesis step, but depending on the type of index being used, they may also be utilized during index construction, insertion, and query traversal.\n",
      "\n",
      "Embedding models are used in LlamaIndex to generate embeddings or representations of text data. These embeddings can be used for various tasks such as similarity matching, clustering, or classification. Embedding models help in organizing and structuring the data within LlamaIndex.\n",
      "\n",
      "Prompts play a crucial role in LlamaIndex as they are used for various purposes. They are used to build the index, perform insertion, traverse queries, and synthesize the final answer. LlamaInd...\n",
      "\n",
      "> Source (Doc id: 5c3e397e-95e1-4a9a-8bf6-0c6a1d7f3589): Sub question: How do I use query engines, query configurations, and the query engine pipeline in LlamaIndex?\n",
      "Response: To use query engines, query configurations, and the query engine pipeline in LlamaIndex, you need to follow these steps:\n",
      "\n",
      "1. Import the necessary modules from the LlamaIndex library. In this case, you need to import the `VectorStoreIndex`, `get_response_synthesizer`, `VectorIndexRetriever`, and `RetrieverQueryEngine` modules.\n",
      "\n",
      "2. Create an instance of the `VectorStoreIndex` class. This class represents the index structure that you want to perform queries on.\n",
      "\n",
      "3. Create an instance of the `VectorIndexRetriever` class, passing the `VectorStoreIndex` instance as a parameter. This class is responsible for retrieving responses from the index.\n",
      "\n",
      "4. Create an instance of the `RetrieverQueryEngine` class, passing the `VectorIndexRetriever` instance as a parameter. This class represents the query engine that will execute the queries and retrieve the responses.\n",
      "\n",
      "5. Configure t...\n",
      "\n",
      "> Source (Doc id: 5c1412b2-165a-45f8-aa66-019bbae04d35): Sub question: What are the supporting modules in LlamaIndex, such as callbacks, service context, and evaluation?\n",
      "Response: The supporting modules in LlamaIndex include callbacks, service context, and evaluation....\n",
      "\n",
      "> Source (Doc id: 66e6e99d-20d9-48f5-b41c-2ab714af20a1): Sub question: Are there any end-to-end tutorials or examples of specific use-cases for LlamaIndex?\n",
      "Response: Based on the given context information, it is not explicitly mentioned whether there are any end-to-end tutorials or examples of specific use-cases for LlamaIndex....\n",
      "\n",
      "> Source (Doc id: bae0667b-f1e5-4303-906a-2ff1c823181a): Sub question: How can I contribute to LlamaIndex, including contributing to the codebase and building documentation?\n",
      "Response: To contribute to LLamaIndex, you can start by contributing to its codebase. This can be done by forking the LLamaIndex repository on a platform like GitHub, making your desired changes or additions to the code, and then submitting a pull request to the main LLamaIndex repository. The LLamaIndex team will review your changes and, if approved, merge them into the codebase.\n",
      "\n",
      "In addition to code contributions, you can also contribute to LLamaIndex by building documentation. This can involve creating or updating documentation that helps users understand how to use LLamaIndex, its features, and its configuration options. You can contribute to the documentation by submitting pull requests to the LLamaIndex documentation repository.\n",
      "\n",
      "By contributing to the codebase and building documentation, you can help improve LLamaIndex and make it more accessible and useful for...\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query('What is the purpose of the `GuidancePydanticProgram` class in the LlamaIndex documentation?')\n",
    "print(str(response))\n",
    "print(\"-----------------\")\n",
    "print(response.get_formatted_sources(length=1000))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The purpose of the `RouterQueryEngine` in LlamaIndex is to perform query transformations over index structures. It is responsible for converting a query into another query, either in a single-step or multi-step process. Additionally, the `RouterQueryEngine` supports streaming the response as it is being generated, allowing for the processing or printing of the beginning of the response before the full response is finished, thereby reducing the perceived latency of queries.\n",
      "\n",
      "As for how the `RouterQueryEngine` can be used in the search pipeline, the given context information does not provide any specific details. Therefore, without prior knowledge, it is not possible to determine the exact usage of the `RouterQueryEngine` in the search pipeline.\n",
      "-----------------\n",
      "> Source (Doc id: b1fba297-c258-4d6f-9f18-205177456c59): Sub question: What is the purpose of the `RouterQueryEngine` in LlamaIndex?\n",
      "Response: Based on the given context information, the purpose of the `RouterQueryEngine` in LlamaIndex is to perform query transformations over index structures. It is responsible for converting a query into another query, either in a single-step or multi-step process. The `RouterQueryEngine` also supports streaming the response as it is being generated, allowing for the processing or printing of the beginning of the response before the full response is finished, thereby reducing the perceived latency of queries....\n",
      "\n",
      "> Source (Doc id: 8cefb2af-73d6-4edf-baab-13651453ea56): Sub question: How can the `RouterQueryEngine` be used in the search pipeline?\n",
      "Response: The given context information does not provide any specific details about the `RouterQueryEngine`. Therefore, without prior knowledge, it is not possible to determine how the `RouterQueryEngine` can be used in the search pipeline....\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query('What is the purpose of the `RouterQueryEngine` in LlamaIndex and how can it be used in the search pipeline?')\n",
    "print(str(response))\n",
    "print(\"-----------------\")\n",
    "print(response.get_formatted_sources(length=1000))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Evaluating Response for Answer Quality"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "import asyncio\n",
    "import math\n",
    "import time\n",
    "\n",
    "import nest_asyncio\n",
    "from llama_index import Response\n",
    "\n",
    "# allow asyncio.run() inside the notebook's already-running event loop\n",
    "nest_asyncio.apply()\n",
    "\n",
    "def evaluate_query_engine(evaluator, query_engine, questions, batch_size=5):\n",
    "    async def run_query(query_engine, q):\n",
    "        try:\n",
    "            return await query_engine.aquery(q)\n",
    "        except Exception:\n",
    "            # a failed query is still passed to the evaluator as a placeholder response\n",
    "            return Response(response=\"Error, query failed.\")\n",
    "\n",
    "    total_correct = 0\n",
    "    all_results = []\n",
    "    # round up so a final partial batch is counted in the progress message\n",
    "    num_batches = math.ceil(len(questions) / batch_size)\n",
    "    for start in range(0, len(questions), batch_size):\n",
    "        batch_qs = questions[start:start + batch_size]\n",
    "\n",
    "        # run the queries in this batch concurrently\n",
    "        tasks = [run_query(query_engine, q) for q in batch_qs]\n",
    "        responses = asyncio.run(asyncio.gather(*tasks))\n",
    "        print(f\"finished batch {(start // batch_size) + 1} out of {num_batches}\")\n",
    "\n",
    "        for question, response in zip(batch_qs, responses):\n",
    "            # the evaluator returns a string; \"YES\" means the response passed\n",
    "            eval_result = 1 if \"YES\" in evaluator.evaluate(question, response) else 0\n",
    "            total_correct += eval_result\n",
    "            all_results.append(eval_result)\n",
    "\n",
    "        # helps avoid rate limits\n",
    "        time.sleep(1)\n",
    "\n",
    "    return total_correct, all_results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "finished batch 1 out of 8\n",
      "finished batch 2 out of 8\n",
      "finished batch 3 out of 8\n",
      "finished batch 4 out of 8\n",
      "finished batch 5 out of 8\n",
      "finished batch 6 out of 8\n",
      "finished batch 7 out of 8\n",
      "finished batch 8 out of 8\n",
      "Response satisfies the query? Scored 19 out of 40 questions correctly.\n"
     ]
    }
   ],
   "source": [
    "from llama_index.evaluation import QueryResponseEvaluator\n",
    "\n",
    "evaluator = QueryResponseEvaluator(service_context=gpt4_service_context)\n",
    "\n",
    "total_correct, all_results = evaluate_query_engine(evaluator, query_engine, question_dataset)\n",
    "\n",
    "print(f\"Response satisfies the query? Scored {total_correct} out of {len(question_dataset)} questions correctly.\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Investigating Incorrect Answers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['What is the purpose of the `GuidancePydanticProgram` class in the LlamaIndex documentation?'\n",
      " 'What is the purpose of the SubQuestionQueryEngine class in LlamaIndex?'\n",
      " 'What are the available options for the storage backend of the index store in LlamaIndex?'\n",
      " 'What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?'\n",
      " \"What is the purpose of the `CollectionQueryConsumer` class in the Delphic application's WebSocket handling?\"\n",
      " 'What is the purpose of the `ReActAgent` and how can it be initialized with other agents as tools?'\n",
      " 'How can I create a Django superuser using the Delphic application?'\n",
      " 'What is the default number of LLM calls required for the ListIndex?'\n",
      " 'What are the different vector stores supported by LlamaIndex for use as the storage backend for `VectorStoreIndex`?'\n",
      " 'What is the purpose of the \"router query engine\" in the LlamaIndex framework?'\n",
      " 'What storage backends are supported by LlamaIndex for persisting data?'\n",
      " 'What is the purpose of the `fetchDocuments` function in the `fetchDocuments.tsx` file in the React frontend?'\n",
      " 'What is the purpose of the `RefinePrompt` class in the LlamaIndex documentation?'\n",
      " \"What is the function used to retrieve the collections for the logged-in user in the Delphic project's frontend?\"\n",
      " 'What is the purpose of the ResponseEvaluator class in the LlamaIndex library?'\n",
      " 'What is the purpose of the Algovera tool built on top of LlamaIndex?'\n",
      " 'What is the purpose of the HyDE query transform in the LlamaIndex?'\n",
      " 'What are the three primary sections within the layout of the ChatView component?'\n",
      " 'What is the purpose of the `insert_terms` function in the LlamaIndex app?'\n",
      " 'What is the purpose of the `load_collection_model` function in the LlamaIndex documentation?'\n",
      " 'What is the purpose of the SQLTableNodeMapping object in the LlamaIndex documentation sample?']\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# keep only the questions whose response the evaluator marked as not answering the query\n",
    "unanswered_queries = np.array(question_dataset)[np.array(all_results) == 0]\n",
    "print(unanswered_queries)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The purpose of the `ReActAgent` is to instantiate an agent from a set of Tools. It can be initialized with other agents as tools by passing them as parameters to the `from_tools()` method.\n",
      "-----------------\n",
      "> Source (Doc id: 809486e9-ee91-42cb-a620-66c534c6890f): Sub question: What is the purpose of the ReActAgent?\n",
      "Response: The purpose of the ReActAgent is to instantiate an agent from a set of Tools....\n",
      "\n",
      "> Source (Doc id: 581d674e-6550-4a40-aa78-8936b53867e8): Sub question: How can the ReActAgent be initialized with other agents as tools?\n",
      "Response: Based on the given context information, it is not possible to initialize the ReActAgent with other agents as tools. The ReActAgent can only be initialized with a s...\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query('What is the purpose of the `ReActAgent` and how can it be initialized with other agents as tools?')\n",
    "print(str(response))\n",
    "print(\"-----------------\")\n",
    "print(response.get_formatted_sources(length=256))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation is to provide a high-level interface for ingesting, indexing, and querying external data. It allows users to customize storage components such as document stores, index stores, and vector stores. The LoadAndSearchToolSpec also supports persisting data to various storage backends and offers different ways of querying a list index, including embedding-based queries and keyword filters.\n",
      "-----------------\n",
      "> Source (Doc id: 8ad62670-cacc-4a34-bb9b-00504904b04c): Sub question: What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?\n",
      "Response: The purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation is not mentioned in the given context information....\n",
      "\n",
      "> Source (Doc id: f5974d32-081a-4366-9e8a-ea9b4bccbdaf): Sub question: What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?\n",
      "Response: The purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation is to provide a high-level interface for ingesting, indexing, and querying...\n",
      "\n",
      "> Source (Doc id: 69d8cee2-a4eb-4183-acbe-34047f3b2649): Sub question: What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?\n",
      "Response: The purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation is to provide a tool specification that can be used to load and search dat...\n",
      "\n",
      "> Source (Doc id: be322260-e24b-4dc3-9540-8b973b1afecf): Sub question: What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?\n",
      "Response: Based on the given context information, it is not possible to determine the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation. T...\n",
      "\n",
      "> Source (Doc id: f0f035ac-2c2d-4ba6-b84b-77d88bb7aaed): Sub question: What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?\n",
      "Response: Based on the given context information, there is no mention of the \"LoadAndSearchToolSpec\" in the LlamaIndex documentation. Therefore, it is not po...\n",
      "\n",
      "> Source (Doc id: b0384b51-bb28-48dd-98fd-63f57d1c4791): Sub question: What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?\n",
      "Response: Based on the given context information, there is no mention of the \"LoadAndSearchToolSpec\" in the LlamaIndex documentation. Therefore, the purpose ...\n",
      "\n",
      "> Source (Doc id: 9d0c096e-9049-4c36-9f1a-5b166c93a635): Sub question: What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?\n",
      "Response: The purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation is not mentioned in the given context information....\n",
      "\n",
      "> Source (Doc id: 65259c15-1ad2-4211-bd11-7c7b6f182cf8): Sub question: What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?\n",
      "Response: The purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation is to provide a tool that allows users to load and search for embeddings a...\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query('What is the purpose of the LoadAndSearchToolSpec in the LlamaIndex documentation?')\n",
    "print(str(response))\n",
    "print(\"-----------------\")\n",
    "print(response.get_formatted_sources(length=256))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "In this notebook, we covered several key topics!\n",
    "\n",
    "- setting up a sub-question query engine\n",
    "- generating a dataset of evaluation questions\n",
    "- evaluating responses for hallucination\n",
    "- evaluating responses for answer quality"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
